ABSTRACT

The purpose of this investigation was to establish
the effects of repeaters on test equating. Since consideration was
not given to repeaters in test equating, such as in the derivation of
equations by Angoff (1971), the hypothetical effect needed to be
established. A case study was examined which showed results on a test
as expected; the overall mean was lower for repeaters. Applying these
data to the available equating equations, it was shown that an
additional 3 percent of the examinees was categorized as having
"passed" than should have been if repeaters were taken into account.
The practical solution offered is to hold separate the scores of
repeaters, execute the equating on the others, and then apply the
conversion to all the examinees. (Author)
Introduction

Test fairness implies that successive forms of a test are equivalent
in all important respects. "However, since the forms cannot be
precisely equivalent, it becomes necessary to equate the forms -- to
convert the system of units of one form to the system of units of the
other -- so that scores derived from the two forms after conversion will
be directly equivalent."¹

A classification of the various methods for equating test forms,
particularly considering the kind of group used and test reliability, is
presented by Angoff (1971). In this general presentation, while the
equating methods are categorized by either assuming random groups or
nonrandom groups (e.g., groups widely different in ability), no attention
is given to the circumstance where individuals in a group are
taking the test for a second time.

This paper is concerned with the effects of repeaters (those who
take a test for a second time) on the conversion scores that result
after test equating. These effects are examined, in particular, for
the design assuming two equally reliable tests are administered to
random groups with each test including a set of common equating items
(Cureton and Tucker, 1951; Levine, 1955; Angoff, 1961; Lennon, 1964;
Angoff, 1971, pp. 576-579).
Method

A practical method of equating and calibrating test scores involves
the use of those items common to both forms of the tests (Angoff, 1971).
This score, U, reflects an individual's performance on those items common
to Form X (administered to Group α) and to Form Y (administered to
Group β).

Lord (1955) has developed equations in which he makes maximum
likelihood estimates of the population means and variances on X and Y.
By definition, linear equating states that scores on two tests are
equivalent if they correspond to equal standard-score deviates,

(1)  $\frac{Y - M_y}{S_y} = \frac{X - M_x}{S_x}$

When the terms are appropriately rearranged, equation 1 takes the form
Y = AX + B where A = Sy/Sx and B = My - AMx, A being the slope of the
conversion line and B the intercept (the point on the Y axis where it is
intersected by the conversion line).
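
The rearrangement of equation 1 can be illustrated with a minimal Python
sketch. The function name `linear_conversion` and the score lists are
hypothetical and serve only to show how A and B follow from the means and
standard deviations; population standard deviations (dividing by N) are
assumed here.

```python
from statistics import mean, pstdev

def linear_conversion(x_scores, y_scores):
    """Return (A, B) so that Y = A*X + B matches equal standard-score deviates."""
    m_x, m_y = mean(x_scores), mean(y_scores)
    s_x, s_y = pstdev(x_scores), pstdev(y_scores)  # assumption: divide by N
    a = s_y / s_x        # slope: ratio of the standard deviations
    b = m_y - a * m_x    # intercept: My - A*Mx
    return a, b

# Hypothetical scores, for illustration only.
form_x = [40, 45, 50, 55, 60]
form_y = [44, 50, 56, 62, 68]

A, B = linear_conversion(form_x, form_y)
converted = [A * x + B for x in form_x]  # Form X scores on the Form Y scale
print(f"A = {A:.4f}, B = {B:.4f}")
```

With these invented scores the converted Form X mean coincides with the
Form Y mean, which is the defining property of the linear conversion.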

The test equations presented in this paper correspond to the type
of conversion expressed in the form of a straight line. That is, it is
reasonable to assume that, by definition, successive forms of a test are
constructed to be nearly equivalent in all the important respects and the
conversion of X scores to Y scores can be accomplished simply by changing
the origin and unit of measurement.

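The equations appropriate to a random administration of X and Y with
U administered to all examinees are as follows:

(2)  $\hat{\mu}_x = M_{x\alpha} + b_{xu\alpha}\,(\hat{\mu}_u - M_{u\alpha})$

(3)  $\hat{\mu}_y = M_{y\beta} + b_{yu\beta}\,(\hat{\mu}_u - M_{u\beta})$

(4)  $\hat{\sigma}_x^2 = S_{x\alpha}^2 + b_{xu\alpha}^2\,(\hat{\sigma}_u^2 - S_{u\alpha}^2)$

(5)  $\hat{\sigma}_y^2 = S_{y\beta}^2 + b_{yu\beta}^2\,(\hat{\sigma}_u^2 - S_{u\beta}^2)$

where $\hat{\mu}_u = M_{ut}$ and $\hat{\sigma}_u^2 = S_{ut}^2$, and
t = α + β (the two groups combined). These estimates are applied
to equation 1 to form the conversion equation Y = AX + B where
A = $\hat{\sigma}_y / \hat{\sigma}_x$ and B = $\hat{\mu}_y - A\hat{\mu}_x$.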
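
A minimal Python sketch of how such anchor-based estimates could be
computed is given below. It assumes the usual form of these estimates:
each group's observed mean and variance is adjusted by the within-group
regression on U toward the mean and variance of U in the combined group.
The function names `slope` and `anchor_conversion`, the use of population
variances (dividing by N), and all data are assumptions made for
illustration, not the computations actually used in this study.

```python
from statistics import mean, pvariance

def slope(dep, anchor):
    """Within-group regression slope of a form score on the anchor score U."""
    m_d, m_a = mean(dep), mean(anchor)
    cov = sum((d - m_d) * (a - m_a) for d, a in zip(dep, anchor)) / len(dep)
    return cov / pvariance(anchor)

def anchor_conversion(x, u_alpha, y, u_beta):
    """Return (A, B) for Y = A*X + B using the combined-group anchor statistics."""
    u_total = list(u_alpha) + list(u_beta)          # t = alpha + beta
    mu_u, var_u = mean(u_total), pvariance(u_total)
    b_x, b_y = slope(x, u_alpha), slope(y, u_beta)
    mu_x = mean(x) + b_x * (mu_u - mean(u_alpha))   # adjusted mean on X
    mu_y = mean(y) + b_y * (mu_u - mean(u_beta))    # adjusted mean on Y
    var_x = pvariance(x) + b_x ** 2 * (var_u - pvariance(u_alpha))
    var_y = pvariance(y) + b_y ** 2 * (var_u - pvariance(u_beta))
    a = (var_y / var_x) ** 0.5
    return a, mu_y - a * mu_x

# Hypothetical data: both groups have identical anchor scores, so every
# parenthetical adjustment term is zero.
x = [40, 45, 50, 55, 60]
y = [44, 50, 56, 62, 68]
u = [8, 10, 12, 14, 16]
A, B = anchor_conversion(x, u, y, u)
print(f"A = {A:.4f}, B = {B:.4f}")
```

In this invented case the two groups perform identically on U, so the
result reduces to the unadjusted conversion A = Sy/Sx and B = My - AMx.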

With reference to the derived equations, if Groups α and β are
identical in their mean performance on test score U, then the values of
the parenthetical terms in equations 2 and 3 are found to be zero. In
other words, the best population estimate of mean scores on Forms X and Y
is the mean that was actually observed for Groups α and β, making group
adjustments unnecessary. Similarly, this holds true for equations 4 and 5.

On the other hand, if Groups α and β are not identical in their mean
performance on test score U, then the equating procedure does make group
adjustments necessary. Ordinarily the adjustments simply reflect sampling
differences in the groups which are chosen at random. While repeaters in
the testing may be noted, the adjustments are not meant to reflect their
presence.

Memory effects from the first administration of a test (item) will
affect the result of the second if the same test (item) is administered
on two successive occasions. The individuals need only remember the
response given on the first occasion and make the same response on the
second in order to obtain complete agreement between the results of the
two measurements. That is, an agreement is obtained which affects the
correlation between repeated measurements but which is not an expression
of the method's reliability. That component of the score obtained on
the first occasion which reappears on the second occasion will in part
do so not because the tests measure the same true score, but as the
result of memory.

It is clear that where a test is being administered for the first
time to a group of examinees the problem of repeaters does not exist.
However, in a second testing the examinees typically include repeaters
who have scored relatively low the first time. Since the repeaters are
not randomly distributed score-wise, as in the original sample, systematic
bias is introduced. The effect is probably more acute where there was
a cut-off score separating those who had "failed" from those who had
"passed."

With reference to the derived equation, if at the first administration
(Form Y), Group β has no repeaters, $M_{u\beta}$ is found. At the second
administration (Form X), Group α having repeaters, $M_{u\alpha}$ would
tend to be depressed in value. Consequently, the group adjustment for
Group α will be upward, unfairly favoring the repeaters along with those
taking that test concurrently.
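
The direction of this adjustment can be seen in a small Python sketch.
It evaluates only the parenthetical term of the mean adjustment (the
combined-group mean on U minus the mean on U of the group containing the
repeaters). The function name `mean_adjustment` and the anchor scores are
hypothetical.

```python
from statistics import mean

def mean_adjustment(u_alpha, u_beta):
    """Combined-group anchor mean minus the anchor mean of Group alpha."""
    pooled = mean(list(u_alpha) + list(u_beta))
    return pooled - mean(u_alpha)

# Hypothetical anchor scores, for illustration only.
u_beta = [12, 13, 14, 15, 16]          # first administration, no repeaters
u_first_timers = [12, 13, 14, 15, 16]  # second administration, first-time takers
u_repeaters = [9, 10]                  # second administration, low-scoring repeaters

without = mean_adjustment(u_first_timers, u_beta)
with_rep = mean_adjustment(u_first_timers + u_repeaters, u_beta)
print(f"adjustment without repeaters: {without:.3f}")
print(f"adjustment with repeaters:    {with_rep:.3f}")
```

In this invented case the term is zero when the two groups match and
becomes positive once low-scoring repeaters pull down the anchor mean of
the second group, which is the upward adjustment described above.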

It should be obvious that the effect is stronger with an increase in
the number of repeaters. Furthermore, in practice, the number of repeaters
in a given test administration is ignored so that the strength of the
effect is an unknown. Besides this, the score distribution of the total
group would tend to be skewed positively due to the presence of repeaters
and hence make the linear approximation for conversion questionable.

In the above discussion it was suggested that the mean of repeaters
would tend to be lower than the rest of the group. On the other hand,
if other influences are operating, such as practice effects and recall of
U items, the mean of repeaters might possibly tend to be higher than the
rest of the group. In order to establish the resultant direction of the
"repeater" effect, if any, the following study is presented.

Empirical Results

Two examinations were administered to groups of adults studying
for the Chartered Life Underwriter (CLU) designation at The American
College of Life Underwriters in 1973. The first examination was
administered in January of 1973 and a second, parallel examination was
administered in June, 1973.² Each examination consisted of 100 items, 20
of which (Form U) were common to both. The descriptive statistics for the
nonrepeaters (NR) taking the examination in January, the repeaters (R)
taking the examination in June, and the combined (C) groups are presented
in Table 1.

INSERT TABLE 1 HERE

Assuming that the groups taking the examinations in January and June
are basically equivalent, the difference in the mean values of the common
items, Uy vs. Ux, should be merely due to sampling error. A t-test for
uncorrelated means for unequal sample size was performed which did not
support this assumption (Uy = 43.99 and Ux = 42.69, t = 4.66, df = 433,
p < .01). Based on this finding, it would appear that the groups taking
the examinations in January and June are not random sample distributions
from an underlying population distribution.
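
For readers who wish to see the form of such a test, the following Python
sketch computes a pooled-variance t statistic for two independent samples
of unequal size. Whether the study pooled the variances is an assumption;
the function name `pooled_t` and the scores are hypothetical, since the
individual scores are not reported here.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(sample_1, sample_2):
    """Return (t, df) for two independent samples, pooling the variances."""
    n1, n2 = len(sample_1), len(sample_2)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(sample_1)
                  + (n2 - 1) * variance(sample_2)) / df
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(sample_1) - mean(sample_2)) / std_error, df

# Hypothetical anchor scores, for illustration only.
january = [44, 46, 43, 45, 47, 42]
june = [41, 43, 40, 44]

t_value, df = pooled_t(january, june)
print(f"t = {t_value:.2f}, df = {df}")
```

The degrees of freedom equal the two sample sizes added together minus 2,
which is how the df values quoted in this section relate to the group
sizes.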

However, the nonrandom effect under investigation here has to do with
repeaters present in the second administration of the examination. First
of all, a t-test was performed with respect to the January examinees
(Form Y, initially all are nonrepeaters), and those nonrepeaters of the
June examinees (Form X); i.e., Uy vs. Ux (NR). A statistically
significant result was found (Uy = 43.99, Ux (NR) = 42.87, t = 2.57,
df = 423, p < .05). Although the comparison of the groups is restricted
to the nonrepeaters, it appears that there is a significant difference,
albeit the level of significance moves from p < .01 to p < .05.

However, of direct interest in this investigation is the result of
the t-test performed with respect to the repeaters (R) and nonrepeaters
(NR) present in the June administration. Interestingly enough, while the
greatest mean difference is observed for these groups (Ux (NR) - Ux (R) =
1.57 > Uy - Ux (combined) = 1.30 > Uy - Ux (NR) = 1.12), no statistical
significance was found (t = 1.57, df = 170, p > .05). This finding makes
sense if one notes the small sample size of the repeaters (N = 20) along
with the low observed mean Ux (R) = 41.30.
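
The reported means can be cross-checked with simple arithmetic. The
Python sketch below takes the figures quoted in this section (the June
total of 172 examinees is reported later in the section), derives the
nonrepeater mean implied by the reported difference of 1.12, and confirms
that weighting the nonrepeater and repeater means by group size
reproduces the combined June mean. The variable names are illustrative.

```python
# Means on the common items as reported in the text.
u_y = 43.99            # January examinees
u_x_combined = 42.69   # June examinees, repeaters and nonrepeaters together
u_x_r = 41.30          # June repeaters
diff_nr = 1.12         # reported Uy - Ux (NR)
n_june, n_r = 172, 20  # June total and number of repeaters

u_x_nr = u_y - diff_nr   # implied June nonrepeater mean
n_nr = n_june - n_r      # implied number of June nonrepeaters

# Weighted mean of the two June subgroups.
weighted = (n_nr * u_x_nr + n_r * u_x_r) / n_june

print(f"Uy - Ux (combined) = {u_y - u_x_combined:.2f}")
print(f"Ux (NR) - Ux (R)   = {u_x_nr - u_x_r:.2f}")
print(f"weighted June mean = {weighted:.2f}")
```

The three differences and the combined mean are mutually consistent to
two decimal places.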

In other words, while the repeaters tend to introduce a systematic
bias in the equating process, the small sample size masks the effect
and traps those who attempt equating examinations into ignoring the
influence. Nevertheless, while statistical significance was not found
with respect to repeaters and nonrepeaters in the June administration,
the effect is now studied in terms of the conversion scores.

Applying the combined data to the equating equations (2) through (5),
the conversion values A1 and B1 obtained are A1 = 1.0728 and B1 = -9.9736
for intact groups of January and June. The mean value for the June group
(X = 90) becomes 100 after conversion. The conversion was calculated
by the linear addition of a constant to the January and June scores for
the total test, tα and tβ, the equating subtest, Uα and Uβ, and the
remaining test items, Yα and Yβ.

Applying only the data from nonrepeaters in similar fashion, the
conversion values A2 = 1.0771 and B2 = -9.0419 are obtained. On this
basis, the mean value for the June group of nonrepeaters (X = 89) equates
to 100 after conversion. If one were to use the equated mean values
obtained from the combined data as the cutting score for "pass" and
"fail", the difference of one point translates into "passing" an
additional three percent of the examinees (5 subjects) of the 172 total.

Extrapolating to the extreme instance, where solely the repeater data
are applied in the equating procedure, the conversion values A3 = 2.0951
and B3 = -50.2374 are obtained. The mean value for the June group of
80.00 then becomes 100.00 after conversion. The difference of 9 points now
translates into "passing" an additional 29 examinees of the 172 total.
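
How a shift in the cutting score turns into additional "passes" can be
sketched in Python as follows. The function name `additional_passes`, the
two cutting scores, and the score list are hypothetical; only the figures
29 and 172 are taken from the text.

```python
def additional_passes(scores, higher_cut, lower_cut):
    """Count examinees who pass under the lower cut but fail under the higher one."""
    return sum(lower_cut <= score < higher_cut for score in scores)

# Hypothetical raw scores, for illustration only.
scores = [84, 86, 88, 88, 89, 90, 91, 93, 95, 97]
extra = additional_passes(scores, higher_cut=90, lower_cut=89)
print(f"additional passes for a one-point shift: {extra}")

# Share of the June group affected in the repeater-only extreme.
share_extreme = 29 / 172
print(f"29 of 172 examinees = {100 * share_extreme:.1f} percent")
```

The count depends entirely on how many scores fall between the two
cutting scores, so the same one-point shift can affect more or fewer
examinees in a different score distribution.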

It is now clear that the performance of repeaters tends to move 
the cutting score downward, a greater number of repeaters having a 
greater influence. It follows that it is to the advantage of those who 
are going to repeat an examination to do so at a time when their numbers 
are great.
Discussion of Repeater Results

In the January 1973 examination, 63 individuals received a score
less than or equal to the passing score of 115. The mean of the 63
failing scores is 105.46. Of the 63 individuals who scored less than
115 in the January 1973 examination, 20 chose to repeat the examination
in June. The mean of these 20 repeaters for the January 1973 examination
is 106.00. It is obvious that these individuals were no different
in their average score than the entire sample of individuals who had
failed the examination in January. One cannot, therefore, hypothesize
that these individuals would repeat because their scores were
significantly closer to the passing point of 115 in January than those
of the remaining group of individuals. The 20 individuals who failed the
examination in January and chose to repeat the examination achieved a
June score of 115.78, after the equating parameters had been applied.
This June 1973 examination score is approximately 10 points higher than
the January 1973 examination score achieved by the same group. Of the 20
repeaters that decided to take the June 1973 examination, 13 passed and
7 failed. A chi-square test of significance between CLUs and non-CLUs who
repeated the examination in June, along the dimensions of those who
received a passing or failing score, was not significant (p > .05,
df = 3).

A continuing analysis of repeaters and nonrepeaters on the equating
items and a random set of items selected for comparison between the
January and June scores was also carried out. Results indicated that
the change in repeaters' performance on equating items, with repeaters
categorized according to whether they were a CLU or non-CLU (N = 11 and
9 respectively), was not significant at the p < .05 level. The chi-square
value for this 2 x 4 analysis indicates that CLUs and non-CLUs did not
significantly change their responses on the equating items from the
January to the June examination (χ² = 2.79, df = 3). This leads to the
tentative conclusion that the equating items were reliable.

To determine whether this result was occurring by chance, a random
set of 20 items was selected from both the January and June examinations.
The criterion for selection was that these items may not occur in the
equating subset. The 20 repeaters' responses and changes from January to
June for these 40 items were recorded. By determining a frequency count
of stability to these 40 items, a chi-square value was obtainable. The
chi-square value of 9.101 was significant at p < .05 (df = 3). For
this 2 x 4 analysis the results clearly indicate the high degree of
change in responses for the repeaters from January to June on the random
set of items that were selected.

Similar chi-square tests were carried out using nonrepeaters, 20
selected from the January examination and 20 from the June examination,
on the 20 equating items. This analysis, as with similar analyses, was
matched for the 11 CLUs and 9 non-CLU students. The 2 x 4 chi-square
was significant at p < .01 (χ² = 20.14, df = 3). This result illustrates
the change in score patterns from January to June for the 40 nonrepeaters
on the 20 equating items. The final comparison was made for the same
40 nonrepeaters on 20 items randomly selected from both the June and
January tests. Again, the chi-square was significant at the p < .01
level for the 2 x 4 analysis (χ² = 14.60, df = 3).
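
The significance levels quoted for the four chi-square values can be
checked with a short Python sketch. For 3 degrees of freedom the
upper-tail probability of the chi-square distribution has a closed form
in terms of the complementary error function, so only the standard
library is needed. The function name `chi2_sf_df3` and the dictionary
labels are illustrative.

```python
from math import erfc, exp, pi, sqrt

def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variate with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)

# Chi-square values reported in the text, all with df = 3.
reported = {
    "repeaters, equating items": 2.79,
    "repeaters, random items": 9.101,
    "nonrepeaters, equating items": 20.14,
    "nonrepeaters, random items": 14.60,
}

p_values = {label: chi2_sf_df3(value) for label, value in reported.items()}
for label, p in p_values.items():
    print(f"{label}: chi-square = {reported[label]}, p = {p:.4f}")
```

The computed probabilities fall on the same side of the .05 and .01
thresholds as the significance statements in the text.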
An additional analysis was also carried out for the repeaters and
nonrepeaters on the equating and randomly selected items. This
information, presented in Table 2, shows the percent change for CLUs and
non-CLUs, first from a right response to a wrong response, given the
total of correct responses in the January administration; and secondly,
from a wrong response to a right response given the total number of
wrongs on the same test. This table illustrates these findings for the
four groups mentioned above, that is, repeaters on the equating items and
on the randomly selected items and nonrepeaters on both the equating
items and randomly selected items.
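
The two percentages described above can be computed as in the following
Python sketch. The function name `switching_rates` and the response
vectors are hypothetical; each vector records right (True) or wrong
(False) responses to the same items on the two occasions.

```python
def switching_rates(first, second):
    """Return (percent right-to-wrong, percent wrong-to-right).

    The right-to-wrong percentage is taken over the responses that were
    right on the first occasion, the wrong-to-right percentage over the
    responses that were wrong on the first occasion. Both counts are
    assumed to be nonzero.
    """
    right_first = sum(first)
    wrong_first = len(first) - right_first
    right_to_wrong = sum(1 for a, b in zip(first, second) if a and not b)
    wrong_to_right = sum(1 for a, b in zip(first, second) if not a and b)
    return (100 * right_to_wrong / right_first,
            100 * wrong_to_right / wrong_first)

# Hypothetical responses to ten items, for illustration only.
january = [True, True, True, True, False, False, True, False, True, True]
june = [True, False, True, True, True, False, True, False, False, True]

r_to_w, w_to_r = switching_rates(january, june)
print(f"right to wrong: {r_to_w:.1f} percent")
print(f"wrong to right: {w_to_r:.1f} percent")
```

Lower percentages on both measures correspond to greater stability of
responses between the two administrations.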

INSERT TABLE 2 HERE 

It is obvious that, in comparing CLUs and non-CLUs on the percent
switching from right to wrong response and wrong to right response on
equating and randomly selected items, the greatest stability is achieved
for the repeater group on the equating items. The average degree of
switching for repeaters on the equating set, approximately 37 percent,
is less than that of any of the other three groups.

Suggested Solution

The common practice is to ignore the presence of repeaters and
somehow vaguely assume a form of randomness has taken care of the problem.
Giving some thought to the problem, a possible solution seems to lie in
deriving equations which do not assume random groups. However, this
sidesteps the problem rather than dealing with the presence of known
repeaters.

A practical solution of how to deal with repeaters is simply not to
include their scores in the calculations. Thus, the assumption of
randomness with respect to groups is not violated for this reason, and
the conversion equations better reflect differences due to random
sampling of groups. Subsequently, once the conversion equations are
determined, the scores of the repeaters are subjected to adjustment in
the same manner as those of the others.
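
The suggested procedure can be sketched in Python as three steps: set the
repeaters aside, estimate the conversion from the remaining examinees,
and apply that conversion to everyone. The anchor-based estimation is
written here in an assumed standard form with population variances; the
function names, record layout, and data are all hypothetical.

```python
from statistics import mean, pvariance

def slope(dep, anchor):
    """Within-group regression slope of a form score on the anchor score U."""
    m_d, m_a = mean(dep), mean(anchor)
    cov = sum((d - m_d) * (a - m_a) for d, a in zip(dep, anchor)) / len(dep)
    return cov / pvariance(anchor)

def anchor_conversion(x, u_alpha, y, u_beta):
    """Return (A, B) for Y = A*X + B, adjusting each group toward the combined U."""
    u_total = list(u_alpha) + list(u_beta)
    mu_u, var_u = mean(u_total), pvariance(u_total)
    b_x, b_y = slope(x, u_alpha), slope(y, u_beta)
    mu_x = mean(x) + b_x * (mu_u - mean(u_alpha))
    mu_y = mean(y) + b_y * (mu_u - mean(u_beta))
    var_x = pvariance(x) + b_x ** 2 * (var_u - pvariance(u_alpha))
    var_y = pvariance(y) + b_y ** 2 * (var_u - pvariance(u_beta))
    a = (var_y / var_x) ** 0.5
    return a, mu_y - a * mu_x

# Hypothetical records. June: (Form X score, U score, is_repeater).
june = [(52, 12, False), (57, 13, False), (62, 14, False), (67, 15, False),
        (72, 16, False), (45, 10, True), (50, 11, True)]
# January: (Form Y score, U score); no repeaters at the first administration.
january = [(56, 12), (62, 13), (68, 14), (74, 15), (80, 16)]

# Step 1: hold the repeaters' scores apart.
first_timers = [record for record in june if not record[2]]

# Step 2: execute the equating on the others only.
a, b = anchor_conversion([r[0] for r in first_timers],
                         [r[1] for r in first_timers],
                         [r[0] for r in january],
                         [r[1] for r in january])

# Step 3: apply the conversion to all examinees, repeaters included.
converted = [a * x + b for x, _, _ in june]
print(f"A = {a:.4f}, B = {b:.4f}")
```

Because the repeaters never enter the estimation, their low anchor scores
cannot shift the conversion, yet their own scores are still converted by
the same line as everyone else's.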

Summary 

The purpose of this investigation was to establish the effects of
repeaters on test equating. Since consideration was not given to
repeaters in test equating, such as in the derivation of equations by
Angoff (1971), the hypothetical effect needed to be established. A case
study was examined which showed results on a test as expected; the
overall mean was lower for repeaters. Applying these data to the available
equating equations, it was shown that an additional three percent of the
examinees was categorized as having "passed" than should have been if
repeaters were taken into account. The practical solution offered is to
hold separate the scores of repeaters, execute the equating on the others,
and then apply the conversion to all the examinees.
Footnotes

1  Angoff, W.H., "Scales, Norms, and Equivalent Scores," Educational
   Measurement (2nd Ed.), 1971, page 562.

2  All raw scores have been transformed (by addition of a constant).